System and process for searching a network

ABSTRACT

A system for searching a network for network-based content related to a search query, such as multimedia and streaming media, includes an adapter for formatting the search query, a first database containing previous search results, a second database for storing currently returnable metadata, a search processor, and at least one search engine kernel comprising a search engine inherent database of searchable metadata. The search processor coordinates searching of the first database and the second database, and provides the formatted search query to the search engine kernel. The search processor also provides and receives search results to and from the first database and the second database, and provides search results to the adapter. The system stores a predetermined amount of previous search results in the first database, such that search results for a current search can be retrieved from the first database, avoiding a search through the search engine kernel.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit under 35 U.S.C. § 365 of International Application PCT/US01/43247 filed Nov. 20, 2001, which claims the benefit of the U.S. Provisional application No. 60/252,273 filed Nov. 21, 2000.

TECHNICAL FIELD

This invention relates generally to computer-related information search and retrieval, and more specifically to a robust system and process for searching for network-based content.

BACKGROUND

As background to understanding the invention, an aspect of the Internet (also referred to as the World Wide Web, or Web) contributing to its popularity is the plethora of multimedia and streaming media files available to users. However, finding a specific multimedia or streaming media file buried among the millions of files on the Web is often an extremely difficult task. The volume and variety of informational content available on the Web is likely to continue to increase at a rather substantial pace. This growth, combined with the highly decentralized nature of the Web, creates substantial difficulty in locating particular informational content.

Streaming media refers to audio, video, multimedia, textual, and interactive data files that are delivered to a user's computer via the Internet or other network environment and begin to play on the user's computer before delivery of the entire file is completed. One advantage of streaming media is that streaming media files begin to play before the entire file is downloaded, saving users the long wait typically associated with downloading the entire file. Digitally recorded music, movies, trailers, news reports, radio broadcasts and live events have all contributed to an increase in streaming content on the Web. In addition, less expensive high-bandwidth connections such as cable, DSL and T1 are providing Internet users with speedier, more reliable access to streaming media content from news organizations, Hollywood studios, independent producers, record labels and even home users.

A user typically searches for specific information on the Internet via a search engine. A search engine comprises a set of programs accessible at a network site within a network, for example a local area network (LAN) or the Internet and World Wide Web. One program, called a “robot” or “spider”, pre-traverses a network in search of documents (e.g., web pages) and other programs, and builds large index files of keywords found in the documents. Typically, a user formulates a query comprising one or more search terms and submits the query to another program of the search engine. In response, the search engine inspects its own index files and displays a list of documents that match the search query, typically as hyperlinks. The user may then activate one of the hyperlinks to see the information contained in the document.

Conventional search engines, however, have drawbacks. For example, many typical search engines are oriented to discover textual information only. In particular, they are not well suited for indexing information contained in structured databases (e.g., relational databases), voice-related information, audio-related information, multimedia, and streaming media. Also, mixing data from incompatible data sources is difficult for conventional search engines.

Furthermore, many conventional search engine systems are neither robust enough nor scalable enough to provide a user with search results, and to update their databases quickly, regardless of the search query. Many search engine systems comprise software elements that reside on specific processors, wherein the software elements are not portable. That is, the software elements cannot be downloaded to another processor in accordance with demand. Also, many of the software elements are vendor specific, wherein the search engine system cannot accommodate software providing similar functionality by another vendor. In the case where software elements may be installed on several processors concurrently to process large amounts of data, many systems are not scalable, in that the number of processors utilized cannot be increased or decreased in accordance with demand. Thus, there is a need for a search system that is not limited by the previously described drawbacks and disadvantages.

SUMMARY

The invention provides a system for searching a network for network-based content related to a search query, the system including an adapter for formatting the search query. The system also includes a first database comprising previous search results and a second database for storing current search results. Also included are at least one search engine for searching search engine inherent databases for content related to the search query, and a search processor. The search processor coordinates searching of the first database and said at least one search engine, and provides the formatted search query to said at least one search engine. The search processor also provides and receives search results to and from the first database and the second database, and provides search results to the adapter.

A method for searching a network for network-based content related to a search query includes receiving the search query, formatting the search query, and searching a database for the network-based content related to the search query. The database comprises previous search results. If no network-based content related to the search query is found in the database, the formatted search query is provided to at least one search engine. Search results are retrieved from the database or the at least one search engine, and the retrieved search results are formatted.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a stylized overview illustration of a system of interconnected computer system networks;

FIG. 2 is a functional block diagram of a search system in accordance with the present invention;

FIG. 3 is a functional block diagram of a search system comprising a plurality of search processors in accordance with the present invention; and

FIG. 4 is a flow diagram of a process for searching network-based content in accordance with the present invention.

DETAILED DESCRIPTION

The Internet is a worldwide system of computer networks, that is, a network of networks in which users at one computer can obtain information from any other computer and communicate with users of other computers. The most widely used part of the Internet is the World Wide Web (often abbreviated “WWW” or called “the Web”). An outstanding feature of the Web is its use of hypertext, which is a method of cross-referencing. In most Web sites, certain words or phrases appear in text of a different color than the surrounding text. This text is often also underlined. Sometimes, there are buttons, images or portions of images that are “clickable.” Using the Web provides access to millions of pages of information. Web “surfing” is done with a Web browser, such as NETSCAPE NAVIGATOR® and MICROSOFT INTERNET EXPLORER®. The appearance of a particular website may vary slightly depending on the particular browser used. Recent versions of browsers have “plug-ins,” which provide animation, virtual reality, sound and music.

The present invention is a system and method for retrieving network-based content, including media files and data related to media files, on a computer network via a search system utilizing metadata. As used herein, the term “media file” includes audio, video, textual, multimedia data files, and streaming media files. Multimedia files comprise any combination of text, image, video, and audio data. Streaming media comprises audio, video, multimedia, textual, and interactive data files that are delivered to a user's computer via the Internet or other communications network environment and begin to play on the user's computer/device before delivery of the entire file is completed. One advantage of streaming media is that streaming media files begin to play before the entire file is downloaded, saving users the long wait typically associated with downloading the entire file. Digitally recorded music, movies, trailers, news reports, radio broadcasts and live events have all contributed to an increase in streaming content on the Web. In addition, the reduction in cost of communications networks through the use of high-bandwidth connections such as cable, DSL, T1 lines and wireless networks (e.g., 2.5G or 3G based cellular networks) is providing Internet users with speedier, more reliable access to streaming media content from news organizations, Hollywood studios, independent producers, record labels and even home users themselves.

Examples of streaming media include songs, political speeches, news broadcasts, movie trailers, live broadcasts, radio broadcasts, financial conference calls, live concerts, web-cam footage, and other special events. Streaming media is encoded in various formats including REALAUDIO®, REALVIDEO®, REALMEDIA®, APPLE QUICKTIME®, MICROSOFT WINDOWS® MEDIA FORMAT, MPEG-2 LAYER III AUDIO, and MP3®. Typically, media files are designated with extensions (suffixes) indicating compatibility with specific formats. For example, media files (e.g., audio and video files) ending in one of the extensions .ram, .rm, or .rpm are compatible with the REALMEDIA® format. Some examples of file extensions and their compatible formats are listed in the following table. A more exhaustive list of media types, extensions and compatible formats may be found at http://www.bowers.cc/extensions2.htm.

TABLE 1

Format                             Extension
REALMEDIA®                         .ram, .rm, .rpm
APPLE QUICKTIME®                   .mov, .qif
MICROSOFT WINDOWS® MEDIA PLAYER    .wma, .cmr, .avi
MACROMEDIA FLASH                   .swf, .swl
MPEG                               .mpg, .mpa, .mp1, .mp2
MPEG-2 LAYER III Audio             .mp3, .m3a, .m3u
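By way of illustration only, the correspondence in Table 1 could be held as a simple lookup keyed by file extension. The sketch below is not part of the described system; the names `EXTENSION_FORMATS` and `format_for` are hypothetical, and only the extension-to-format pairs are taken from Table 1.

```python
import os

# Extension-to-format pairs copied from Table 1.
# The dictionary and function names are hypothetical, chosen for illustration.
EXTENSION_FORMATS = {
    ".ram": "REALMEDIA", ".rm": "REALMEDIA", ".rpm": "REALMEDIA",
    ".mov": "APPLE QUICKTIME", ".qif": "APPLE QUICKTIME",
    ".wma": "MICROSOFT WINDOWS MEDIA PLAYER",
    ".cmr": "MICROSOFT WINDOWS MEDIA PLAYER",
    ".avi": "MICROSOFT WINDOWS MEDIA PLAYER",
    ".swf": "MACROMEDIA FLASH", ".swl": "MACROMEDIA FLASH",
    ".mpg": "MPEG", ".mpa": "MPEG", ".mp1": "MPEG", ".mp2": "MPEG",
    ".mp3": "MPEG-2 LAYER III Audio",
    ".m3a": "MPEG-2 LAYER III Audio",
    ".m3u": "MPEG-2 LAYER III Audio",
}


def format_for(filename):
    """Return the compatible format for a file name, or None if unknown."""
    _, extension = os.path.splitext(filename)
    # Extensions are compared case-insensitively (an assumption of this sketch).
    return EXTENSION_FORMATS.get(extension.lower())
```

For instance, `format_for("YELLOW_SUB.RM")` yields the REALMEDIA entry, while a file name with an unlisted extension yields `None`.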

Metadata literally means “data about data.” Metadata is data that comprises information that describes the contents or attributes of other data (e.g., a media file). For example, a document entitled “Dublin Core Metadata for Resource Discovery” (http://www.ietf.org/rfc/rfc2413.txt) separates metadata into three groups, which roughly indicate the class or scope of information contained therein. These three groups are: (1) elements related primarily to the content of the resource, (2) elements related primarily to the resource when viewed as intellectual property, and (3) elements related primarily to the instantiation of the resource. Examples of metadata falling into these groups are shown in the following table.

TABLE 2

Content        Intellectual Property    Instantiation
Title          Creator                  Date
Subject        Publisher                Format
Description    Contributor              Identifier
Type           Rights                   Language
Source
Relation
Coverage

Sources of metadata include web page content, uniform resource locators (URLs), media files, and transport streams used to transmit media files. Web page content includes HTML, XML, metatags, and any other text on the web page. As explained in more detail herein, metadata may also be obtained from the URLs of the web page, media files, and other metadata. Metadata within the media file may include information contained in the media file, such as in a header or trailer of a multimedia or streaming file, for example. Metadata may also be obtained from the media/metadata transport stream, such as TCP/IP (e.g., packets), ATM, frame relay, cellular based transport schemes (e.g., cellular based telephone schemes), MPEG transport, HDTV broadcast, and wireless based transport, for example. Metadata may also be transmitted in a stream in parallel with, or as part of, the stream used to transmit a media file (e.g., a High Definition television broadcast is transmitted on one stream and metadata, in the form of an electronic programming guide, is transmitted on a second stream).

Referring to FIG. 1, there is shown a stylized overview of a system 100 of interconnected computer system networks 102 and 112. Each computer system network 102 and 112 contains at least one corresponding local computer processor unit 104 (e.g., server), which is coupled to at least one corresponding local data storage unit 106 (e.g., database), and local network users 108. A computer system network, as a communications network, may be a local area network (LAN) 102 or a wide area network (WAN) 112, for example. The local computer processor units 104 are selectively coupled to a plurality of media devices 110 through the network (e.g., Internet) 114. Each of the plurality of local computer processors 104, the network user processors 108, and/or the media devices 110 may have various devices connected to its local computer systems, such as scanners, bar code readers, printers, and other interface devices. A local computer processor 104, network user processor 108, and/or media device 110, programmed with a Web browser, locates and selects (e.g., by clicking with a mouse) a particular Web page, the content of which is located on the local data storage unit 106 of a computer system network 102, 112, in order to access the content of the Web page. The Web page may contain links to other computer systems and other Web pages.

The local computer processor 104, the network user processor 108, and/or the media device 110 may be a computer terminal, a pager which can communicate through the Internet using the Internet Protocol (IP), a kiosk with Internet access, a connected electronic planner (e.g., a PALM device manufactured by Palm, Inc.) or other device capable of interactive communication through a network, such as an electronic personal planner. The local computer processor 104, the network user processor 108, and/or the media device 110 may also be a wireless device, such as a hand held unit (e.g., cellular telephone) that connects to and communicates through the Internet using the Wireless Application Protocol (WAP). Networks 102 and 112 may be connected to the network 114 by a modem connection, a Local Area Network (LAN), cable modem, digital subscriber line (DSL), twisted pair, wireless based interface (cellular, infrared, radio waves), or equivalent connection utilizing data signals. Databases 106 may be connected to the local computer processor units 104 by any means known in the art. Databases 106 may take the form of any appropriate type of memory (e.g., magnetic, optical, etc.). Databases 106 may be external memory or located within the local computer processor 104, the network user processor 108, and/or the media device 110.

Computers may also encompass computers embedded within consumer products and other computers. For example, an embodiment of the present invention may comprise computers (as a processor) embedded within a television, a set top box, an audio/video receiver, a CD player, a VCR, a DVD player, a multimedia-enabled device (e.g., telephone), and an Internet-enabled device.

In an exemplary embodiment of the invention, the network user processors 108 and/or media devices 110 include one or more program modules and one or more databases that allow the user processors 108 and/or media devices 110 to communicate with the local processor 104, and each other, over the network 114. The program module(s) include program code, written in PERL, Extensible Markup Language (XML), Java, Hypertext Mark-up Language (HTML), or any other equivalent language which allows the network user processors 108 to access the program module(s) of the local processors 104 through the browser programs stored on the network user processors 108.

Web sites and web pages are locations on a network, such as the Internet, where information (content) resides. A web site may comprise a single web page or several web pages. A web page is identified by a Uniform Resource Identifier (URI) comprising the location (address) of the web page on the network. Web sites and web pages may be located on local area network 102, wide area network 112, network 114, processing units (e.g., servers) 104, user processors 108, and/or media devices 110. Information, or content, may be stored in any storage device, such as a hard drive, compact disc, and mainframe device, for example. Content may be stored in various formats, which may differ from web site to web site, and even from web page to web page.

FIG. 2 is a functional block diagram 200 of a robust system for searching a network in accordance with the present invention. System 200 comprises several functional elements including an adapter 12, a search processor 14, data stores for search results 16, a query cache 18, search engine kernels 20, search persistent store 22, and promoter 24. In one embodiment of the invention, each of the functional elements of system 200 is implemented on a plurality of processors, which may be dynamically modified in accordance with the demand being placed on the system 200. For example, each functional element in system 200 may reside on a separate processing unit, wherein additional processing units are brought on line to help process any particular function and deactivated when the demand decreases. In another exemplary embodiment, all functional elements of system 200 reside on a single processing unit, wherein software code segments and memory are associated with each functional element. The amount of the single processor unit's resources available for a particular functional element is dynamic, and allocated according to the demand placed on the single processing unit for a particular function.

System 200 comprises the characteristics of separability (severability) and scalability. Separability refers to the functional elements of system 200 being completely portable and replaceable. That is, each functional element may reside on any processing unit and any functional element may be replaced by an updated version, or another vendor's version, of the functional element. Separability is ensured by implementing inter-element specific interface protocols. The inter-element interfaces, referred to as application programming interfaces (APIs), allow functional elements to communicate with each other, regardless of the version or vendor of the functional element. APIs are known in the art. An API is a set of predetermined, re-usable protocols. For example, to create an API for searching, an interface is provided with a method, search, having a set of defined parameters (e.g., a query string, a string of desired bit rates, a string of desired systems), which all systems must honor to conform to the API.
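By way of illustration only, a search API of the kind described above might be sketched as follows. The parameter list (a query string, desired bit rates, desired systems) follows the example in the text; the names `SearchKernel`, `InMemoryKernel`, and `run_search`, the record layout, and the matching rule are assumptions made for this sketch and are not taken from any particular search engine.

```python
from typing import Dict, List, Protocol, Sequence


class SearchKernel(Protocol):
    """Hypothetical interface that every conforming kernel must honor."""

    def search(self, query: str, bit_rates: Sequence[str],
               systems: Sequence[str]) -> List[str]:
        ...


class InMemoryKernel:
    """Toy kernel used only to show that any conforming element is swappable."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self._records = records

    def search(self, query: str, bit_rates: Sequence[str],
               systems: Sequence[str]) -> List[str]:
        keys = []
        for record in self._records:
            if query.lower() not in record["title"].lower():
                continue
            # An empty filter means "any value" (an assumption of this sketch).
            if bit_rates and record["bit_rate"] not in bit_rates:
                continue
            if systems and record["system"] not in systems:
                continue
            keys.append(record["key"])
        return keys


def run_search(kernel: SearchKernel, query: str, bit_rates: Sequence[str],
               systems: Sequence[str]) -> List[str]:
    # The caller depends only on the interface, not on the vendor or version.
    return kernel.search(query, bit_rates, systems)
```

Because `run_search` relies only on the `search` method and its parameters, a kernel from a different vendor could be substituted without changing the caller, which is the sense of separability described above.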

Scalability refers to the system 200 being capable of reallocating system resources to meet specific functional element demand. For example, system 200 increases or decreases memory available to a specific functional element, such as query cache 18, in accordance with the amount of memory needed by that functional element. Thus, if query cache 18 requires more memory, system 200 makes more memory available to query cache 18. As data is removed from query cache 18, the unused memory is made available to other functional elements of the system 200. Currently, reallocation requires restarting the affected component after modifying its configuration settings.

Adapter 12 is a functional element for translating and formatting search queries into a system format usable by system 200. Adapter 12 translates a query, such as a user submitted search query, from a standard protocol, such as hypertext transfer protocol (HTTP), into a system 200 specific format, such as extensible markup language (XML), in accordance with the schemes required by the search engine kernels 20. Specific search engines often require data to be provided in that search engine's specific format of XML. Thus, adapter 12 translates and formats search queries to each search engine's specific format. Adapter 12 also formats the search results from the system format to the submitted format or requested formats, such as hypertext markup language (HTML) and XML.
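By way of illustration only, the translation of an HTTP query string into an XML form might be sketched as follows. The element name `searchQuery`, the `param` child elements, and the function name are hypothetical; the text does not specify the XML scheme required by any particular search engine kernel.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET


def http_query_to_xml(query_string):
    """Translate an HTTP query string into a hypothetical XML search query."""
    params = parse_qs(query_string)  # e.g. {"q": ["yellow submarine"]}
    root = ET.Element("searchQuery")
    # Parameters are emitted in sorted order so the output is deterministic.
    for name in sorted(params):
        for value in params[name]:
            child = ET.SubElement(root, "param", {"name": name})
            child.text = value
    return ET.tostring(root, encoding="unicode")
```

A call such as `http_query_to_xml("q=yellow+submarine&bitrate=30")` produces a single `searchQuery` element with one `param` child per submitted parameter. An adapter for a different kernel would apply the same idea with that kernel's own element names.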

HTTP is the protocol most commonly used by processors on the Internet to communicate with each other. An HTTP transaction typically comprises a request sent by one processor to another processor, and a response returned. HTTP requests and responses include a message header, describing the message. XML is a language which describes network (e.g., Internet) data and its structure, in contrast to HTML, which describes how data should be presented. XML provides a user the ability to create her own vocabulary to describe information. With this ability, an XML document can be designed to fit specific purposes, which is not possible with HTML. Thus, it is not uncommon for many search engines to create search engine specific XML code for provided data.

Search processor 14 is a functional element for coordinating the searching process performed by the system 200. Search processor 14 ensures that a search query is properly translated to the system format and that search results are translated into the proper format (e.g., user-provided format, user-specified format). Search processor 14 also ensures that queries are searched for in the appropriate database.

Query cache 18 is a functional element comprising copies of search-engine results, such as data identifiers and scores related to and/or from a number of previous searches, although the query cache 18 may accommodate other forms of data related to prior searches. Query cache 18 may comprise any processor, code segment, storage device, database, or a combination thereof capable of storing search results and communicating the same with the search processor 14. As search queries are provided to the system 200, results of the searches based upon these queries are stored in the query cache 18. If the search results for the present search query are stored in the query cache 18, the search results are retrieved by search processor 14 directly from query cache 18, without accessing the search engine kernel 20. The record identifiers returned from query cache 18 or search engine kernel 20 are then combined with the displayable data from the results data store 16. These combined results are then provided to the user or requesting system via adapter 12. Search results stored in query cache 18 are updated in accordance with a process called LRU (least recently used). In accordance with the LRU process, the most recent search results replace the search results that have resided in query cache 18 the longest. That is, the most recent search results replace the oldest search results. Thus, the amount of memory (size) contained in the query cache 18 remains approximately constant, within limits. However, the size of query cache 18 may be increased or decreased in accordance with the demand placed on query cache 18. In one embodiment, the size is configurable as a startup parameter, and changing the size of the cache requires rebooting the query cache 18 only. The system recognizes that the query cache has restarted and carries on normally. Furthermore, search results stored in the query cache 18 are deleted (i.e., removed from query cache 18) if they are not accessed or replaced within a predetermined amount of time. For example, in order to provide timely results for items such as news, the items are “aged out” after approximately 30 minutes. However, this parameter is configurable, and may be set to any desired value.
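By way of illustration only, a cache combining a bounded size, least-recently-used replacement, and aging out of idle entries might be sketched as follows. The class name `QueryCache`, its parameters, and the injected `clock` function are assumptions of this sketch. Following the wording above, an entry's age is measured from the time it was last accessed or replaced, and the 30 minute default mirrors the news example.

```python
import time
from collections import OrderedDict


class QueryCache:
    """Hypothetical size-bounded cache with LRU replacement and aging."""

    def __init__(self, max_entries=1000, max_age_seconds=30 * 60,
                 clock=time.monotonic):
        self._entries = OrderedDict()  # query -> (last_touched, results)
        self._max_entries = max_entries
        self._max_age = max_age_seconds
        self._clock = clock

    def get(self, query):
        item = self._entries.get(query)
        if item is None:
            return None
        last_touched, results = item
        now = self._clock()
        if now - last_touched > self._max_age:
            # Not accessed or replaced within the configured time: age it out.
            del self._entries[query]
            return None
        # An access refreshes both the timestamp and the LRU position.
        self._entries[query] = (now, results)
        self._entries.move_to_end(query)
        return results

    def put(self, query, results):
        self._entries[query] = (self._clock(), results)
        self._entries.move_to_end(query)
        # Keep the size approximately constant by dropping the LRU entries.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

Passing the clock in as a parameter is a design choice of the sketch: it lets the aging behavior be exercised without waiting for real time to elapse.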

Query cache 18, in an alternative embodiment, supports results paging. The result from a query typically includes all the hits corresponding to a search query. In this embodiment, search processor 14, when retrieving data from the query cache 18, receives only the subset of data necessary to satisfy a request for a currently requested page of data (corresponding to a first displayed page, for example), formatted by adapter 12. An additional subset of data (corresponding to a second displayed page, for example) is sent from query cache 18 when search processor 14 requests an additional page of search results formatted by adapter 12. For instance, immediately after a query is run, only the first page of search results is returned to adapter 12 for formatting. If a user wishes to see a second page, the same query is passed through the system again, but the search request is only for the rows of data corresponding to a second displayed page.
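By way of illustration only, selecting the rows for one displayed page from the full list of cached hits might be sketched as follows. The function name, the one-based page numbering, and the default page size of ten are assumptions of this sketch.

```python
def page_of_results(all_hits, page_number, page_size=10):
    """Return only the rows needed for the requested page (pages start at 1)."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    start = (page_number - 1) * page_size
    # Slicing past the end simply yields a short or empty page.
    return all_hits[start:start + page_size]
```

With 25 cached hits and a page size of ten, the first request returns rows 1 through 10, a request for the third page returns the remaining five rows, and a request for a fourth page returns an empty list.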

Search engine kernels 20 are functional elements for providing the search mechanism, wherein databases are searched for the search query and search query related data. The databases searched by the search engine kernels 20 comprise content resulting from agents, such as spiders and robots, searching a network (e.g., the Internet). The search engine kernel 20 may be any appropriate search engine kernel known in the art. Examples of search engine kernels include Oracle™- iMT™, AltaVista™, and InfoSeek™. The severability of system 200 through the use of APIs allows any search engine kernel to be modified to a newer version, replaced with another vendor's version, replaced with a different search engine kernel, or a combination thereof, without disabling system 200. Thus system 200 is not dependent upon one specific type of search engine kernel. Although system 200 is depicted in FIG. 2 as comprising a plurality of search engine kernels 20, system 200 may comprise a single search engine kernel 20 in accordance with the present invention.

Results data stores 16 are functional elements for storing metadata associated with every item stored in the search engine kernel and the search persistent store. Data identifiers, such as primary keys, found through the use of search engine kernels 20, are stored in results data stores 16 for subsequent provision to a user or requesting system. Search results comprise any returnable metadata known for each stream. Examples include title, URL, author, bit rate, and system. Tables 3 and 4 contain metadata for three different items (three unique filenames). The result key is a unique identifier for indexing into the metadata stored in the results data store. The score is a numeric weighting computed for a specific query for the particular result key. This numeric weighting accounts for term frequency, date relevancy, and other relevancy requirements to arrive at a single weighted score for each query for each row. Each results data store 16 may comprise any processor, code segment, storage device, database, or a combination thereof capable of storing search results and communicating the same with the search processor 14. Although system 200 is depicted in FIG. 2 as comprising a plurality of results data stores 16, system 200 may comprise a single results data store 16 in accordance with the present invention.

Search persistent store 22 is a functional element for storing the most recent view of the metadata in order to update the search engine kernels 20 and provide search metadata to the results data stores 16. Search persistent store 22 may comprise any processor, code segment, storage device, database, or a combination thereof capable of storing search results, providing search results to results data stores 16, and updating search engine kernels 20. The search persistent store 22 stores a full version of the metadata (both searchable and returnable) for every stream. The search engine kernel 20 contains only searchable metadata (which it gets from the search persistent store 22), while the results data store 16 receives the returnable metadata. Both the search engine kernel 20 and the results data store 16 are updated by the search persistent store 22 distributing results to each corresponding system.
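By way of illustration only, the distribution of a full metadata record into a searchable view and a returnable view might be sketched as follows. Which fields count as searchable and which as returnable is not specified in the text; the two field sets below, the record layout, and the function name `distribute` are assumptions of this sketch.

```python
# Hypothetical field sets; the text gives title, URL, author, bit rate, and
# system only as examples of returnable metadata.
SEARCHABLE_FIELDS = {"title", "artist", "album"}
RETURNABLE_FIELDS = {"title", "url", "bit_rate", "system"}


def distribute(persistent_store, kernel_index, result_store):
    """Copy each full record into the kernel view and the results view."""
    for key, record in persistent_store.items():
        kernel_index[key] = {
            field: value for field, value in record.items()
            if field in SEARCHABLE_FIELDS
        }
        result_store[key] = {
            field: value for field, value in record.items()
            if field in RETURNABLE_FIELDS
        }
```

In this arrangement the persistent store remains the only place holding the full record, so either view can be rebuilt from it at any time.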

Promoter 24 is a functional element for updating the intermediate metadata stored in search persistent store 22 with the most recent version of metadata known for the given file/stream. This recent metadata is then provided to the search engine kernel 20 and the results data store 16 in a timely manner to provide a view of the metadata as it evolves. This mechanism provides a means of updating the metadata at a fast rate and a means to provide the metadata to the results data store 16 and search engine kernels 20 on a periodic timeline, as processing load allows. The search persistent store/promoter is typically shared between monoliths at geographically similar locations. These mechanisms are the master source of metadata for updating the searchable view from the search engine kernels 20 and results data store 16, thus providing a search system that is reliable and maintainable. Promotion takes updated metadata from the workflow system and updates the system as new data are discovered and current data are updated. The search engine kernels 20 and results data stores 16 retrieve updated content from the search persistent store 22 at a configurable interval to update their view of the metadata, such as shown in Tables 3 and 4.

In an alternative embodiment of the invention, promoter 24 functions with two subsystems: one for data acquisition, and the other for moving data between databases and search clients. The first subsystem acquires metadata from sources connected to the Internet through data extractors well known in the art, such as spiders. This collected metadata is then moved into search persistent store 22 by the promoter's first subsystem. The second subsystem, called the “distributor,” moves data (including some of the collected metadata) from the search persistent store 22 to clients such as search engines, search engine kernels 20, results data stores 16, and other search persistent stores 22 that are geographically remote from the search persistent store 22.

TABLE 3

File Name:         ALL_LOVE.WM              YELLOW_SUB.RM
Title:             All You Need Is Love     Yellow Submarine
Artist:            Beatles                  Beatles
Album:             Yellow Submarine         Yellow Submarine
Copyright Date:    1969                     1969
Format:            Microsoft MediaPlayer    RealMedia
Playback Rate:     250 KB                   30 KB

TABLE 4

File Name:         YELLOW_SUB.RM            YELLOW_MOV.RM
Title:             Yellow Submarine         Yellow Submarine
Artist:            Beatles
Album:             Yellow Submarine
Actor:                                      John Lennon
Genre:                                      Musical
Copyright Date:    1969                     1969
Format:            RealMedia                RealMedia
Playback Rate:     30 KB                    250 KB

FIG. 3 is a functional block diagram of a system 300 in accordance with the present invention comprising a plurality of search processors 14 and a load balancer 28. As can be seen in FIG. 3, system 300 comprises two subsystems 32 sharing a common query cache 18. Each subsystem 32 operates in a manner similar to system 200. In accordance with the demand being placed on system 300, load balancer 28 distributes the processing load approximately evenly between the subsystems. Furthermore, if one subsystem becomes inoperative, the load balancer 28 dynamically allocates the workflow for the inoperative subsystem to another subsystem. Although system 300 is depicted as having two subsystems 32, system 300 may comprise more than two subsystems 32 sharing a common query cache 18, to accommodate the demand being placed on the system 300. Optionally, system 300 has the two subsystems 32 sharing data between their respective search persistent stores 22 and promoters 24.
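By way of illustration only, a load balancer that spreads requests approximately evenly and redirects work away from an inoperative subsystem might be sketched as follows. Round-robin selection is one simple policy chosen for this sketch; the text does not state which policy load balancer 28 uses, and the class and method names are hypothetical.

```python
class LoadBalancer:
    """Hypothetical round-robin balancer with failover between subsystems."""

    def __init__(self, subsystems):
        self._subsystems = list(subsystems)
        self._healthy = set(range(len(self._subsystems)))
        self._next = 0

    def mark_down(self, index):
        self._healthy.discard(index)

    def mark_up(self, index):
        self._healthy.add(index)

    def pick(self):
        """Return the next operative subsystem, skipping inoperative ones."""
        count = len(self._subsystems)
        for offset in range(count):
            index = (self._next + offset) % count
            if index in self._healthy:
                self._next = (index + 1) % count
                return self._subsystems[index]
        raise RuntimeError("no operative subsystem available")
```

With two subsystems, successive calls to `pick` alternate between them; after one is marked down, every call returns the other until the first is marked up again.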

FIG. 4 is a flow diagram of a process for searching network-based content in accordance with the present invention. Adapter 12, at step 42, receives the search query. The search query may be provided by a user, a requesting system, or by both query providers. The adapter 12 translates and/or formats the search query from a standard protocol (e.g., HTTP) to a system specific format (e.g., XML) at step 44. At step 48, the query cache 18 is searched for content relating to the search query. If content related to the search query is found in the query cache 18, the search results comprising that content are retrieved from the query cache 18 at step 46. Search engine kernels 20 are not searched if search results are obtained from the query cache 18. Thus, by not employing a search engine to search for content related to the search query, the system and process provide a very quick and efficient means for providing the search results to a user and/or requesting system. A history file is updated with the information pertaining to the search results retrieved from the query cache 18 at step 56. The information in this history file is used to update the query cache 18. Data for each key/score returned from the query cache 18 and/or search engine kernel 20 are combined with the returnable metadata from the results data store 16. The search results from the query cache 18 (keys and scores) and the returnable metadata (from the results data store 16), fetched at step 57, are merged at step 65, forming the merger of search engine query results with related data store results. The search results are then formatted to conform to the format in which the search query was originally provided, or a specifically requested format, at step 58. The formatted search results are then provided to a user and/or system through the results data store 16, search processor 14 and adapter 12, at step 60.
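By way of illustration only, the merging of keys and scores with returnable metadata might be sketched as follows. The function name, the use of a dictionary for the results data store, the descending sort by score, and the decision to skip keys with no stored metadata are assumptions of this sketch.

```python
def merge_results(hits, result_store):
    """Combine (key, score) pairs with returnable metadata for each key.

    hits: list of (key, score) pairs from the query cache or a kernel.
    result_store: mapping from key to a dict of returnable metadata.
    """
    merged = []
    # Highest score first (an assumed presentation order).
    for key, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        metadata = result_store.get(key)
        if metadata is None:
            # Assumption: a key with no returnable metadata is omitted.
            continue
        merged.append({"key": key, "score": score, **metadata})
    return merged
```

Each merged row then carries both the query-specific score and the displayable fields, ready to be formatted for the requester.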

In various embodiments of the search system, the query cache 18 may be shared among co-located entities or monoliths, or be contained within one monolith, wherein each monolith comprises a query cache 18. This flexibility is also applicable to the results data stores 16. That is, the results data store 16 may be shared among co-located monoliths, or be contained within one monolith, wherein each monolith comprises a results data store 16.

If no content related to the search query is found in query cache 18, the formatted search query is provided to the search engine kernels 20, at step 50. The search engine kernels search databases comprising searchable metadata, which are inherent to each search engine. The query may also involve multiple search engines with their corresponding search engine kernel databases being the metadata searched against. These inherent databases may comprise the results of network searches conducted by agents, such as spiders and robots. Results are obtained by searching the known subset of the search engine kernels 20, at step 52, yielding search engine result(s) that are merged if the results come from multiple search engines. The search persistent store 22 is a central cache of all data coming from promotion that is used to update the search engine kernel 20 and the results data store 16. The search persistent store 22 may be co-located with the monoliths, or a geographically separated monolith may have its own search persistent store 22, which is synched via promotion. The query cache 18 is updated (for example, by adding, changing, or deleting entries) with the information pertaining to the search results obtained from the search engine searches at step 54. Accordingly, if the current search query is provided to the system again, the system will retrieve search results from the query cache 18, rather than employing the time consuming search engine kernels 20 again. The history file is updated with the information pertaining to the search results retrieved from the search engine kernels 20 at step 56. The information in this history file is used to update the query cache 18. The search results from the query cache 18 (keys and scores) and the returnable metadata (from the results data store 16), fetched at step 57, are merged at step 65, forming the merger of search engine query results with related data store results. The search results are then formatted to conform to the format in which the search query was originally provided, or a specifically requested format, at step 58. The search results obtained from the query cache 18 are then provided to a user and/or system through the results data store 16, search processor 14 and adapter 12, at step 60.
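By way of non-limiting illustration, the cache-first flow of steps 48 through 65 may be sketched as follows. All names are hypothetical. The sketch assumes that query cache 18 and results data store 16 can be modeled as dictionaries, that each search engine kernel 20 can be modeled as a callable returning (key, score) pairs, and that results from multiple kernels are merged by keeping the highest score per key; the merging policy in particular is an assumption, as the description does not specify one.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[str, float]  # (key, score), as held in the query cache


def search(
    query: str,
    query_cache: Dict[str, List[Hit]],
    kernels: List[Callable[[str], List[Hit]]],
    results_store: Dict[str, dict],
    history: List[tuple],
) -> List[dict]:
    hits = query_cache.get(query)           # step 48: search the query cache
    if hits is None:
        # Steps 50 and 52: cache miss, so consult every search engine kernel.
        best: Dict[str, float] = {}
        for kernel in kernels:
            for key, score in kernel(query):
                # Assumed merge policy: keep the highest score for each key.
                best[key] = max(score, best.get(key, score))
        hits = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
        query_cache[query] = hits           # step 54: update the query cache
    history.append((query, [key for key, _ in hits]))  # step 56: history file
    # Steps 57 and 65: fetch returnable metadata and merge it with each hit.
    return [
        {"key": key, "score": score, **results_store.get(key, {})}
        for key, score in hits
    ]
```

In this sketch, a second call with the same query is answered from `query_cache` and does not invoke any kernel, which corresponds to the time saving described above.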

The translation and formatting performed at step 58 comprises formatting to extract search query specific content (for example streaming media files) from the intermediate search results stored in the search engine kernel 20, and formatting the search results to comply with the user provided or specified format by adapter 12.

It is noted that while some embodiments of system 200 operate with a single processor, the invention also operates efficiently when deployed over multiple monoliths, shown as systems 32 in FIG. 3 (each with its own search processor 14), which have search subsystems that may be shared. For example, two systems 32 are coupled together to form system 300. The number of search processors for system 300 is two, but in this alternative embodiment of the invention, there is a single search persistent store 22 and a single promoter 24 shared between the systems 32. This doubling of subsystems may double the number of queries per minute for which the system 300 in FIG. 3 yields returnable metadata, while having little impact on the metadata that may be searched. The invention also accommodates other permutations of scalable deployment; for example, two search persistent stores 22 may be shared by three systems 32, based upon geographic or bandwidth concerns. Additionally, multiple search engine kernels 20 may be added within a system 32 to further increase the volume of databases that may be queried for a search (searchable metadata). In essence, the more systems 32 coupled together, and the more search engine kernels 20 added within each system 32, the more search queries, searchable metadata, and returnable metadata may be accommodated within the described invention.

A system and process for searching a network in accordance with the present invention provide robustness, separability, scalability, efficiency, and quickness. These characteristics are provided by a system comprising functional elements having defined application program interfaces (APIs) to each of the other functional elements. Thus, a change in version or vendor source of a functional element will have minimal impact on the system. Further, the system is dynamically reconfigurable to meet the processing and memory demands being placed on the system. No one functional element need reside on a specific hardware device, thus providing reconfigurability comparable to a distributed architecture. Also, the system stores a predetermined amount of previous search results in a cache memory, such that search results for a current search are retrieved from that cache, thus avoiding the time consuming process of employing a search engine to search the network.
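By way of non-limiting illustration, a cache holding a predetermined amount of previous search results may be sketched as follows. The class name `QueryCache`, its parameters, and the eviction policy shown (the oldest stored entry is dropped when capacity is exceeded, and an entry is dropped once it has resided in the cache for a predetermined time) are assumptions made for this sketch and do not limit how the cache memory may be realized.

```python
from collections import OrderedDict
import time


class QueryCache:
    """Hypothetical bounded cache of previous search results."""

    def __init__(self, max_entries, max_age_seconds, clock=time.monotonic):
        self._entries = OrderedDict()   # query -> (stored_at, results)
        self._max_entries = max_entries
        self._max_age = max_age_seconds
        self._clock = clock             # injectable so the sketch is testable

    def get(self, query):
        item = self._entries.get(query)
        if item is None:
            return None
        stored_at, results = item
        if self._clock() - stored_at >= self._max_age:
            # The entry has resided in the cache too long; remove it.
            del self._entries[query]
            return None
        return results

    def put(self, query, results):
        # Re-inserting moves the query to the most recent position.
        self._entries.pop(query, None)
        self._entries[query] = (self._clock(), results)
        while len(self._entries) > self._max_entries:
            # Most recent results replace the least recent ones.
            self._entries.popitem(last=False)
```

A `get` that returns `None` corresponds to the case in which the search engine kernels must be consulted.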

The present invention may be embodied in the form of computer-implemented processes and apparatus for practicing those processes. The present invention may also be embodied in the form of computer program code embodied in tangible media, such as floppy diskettes, read only memories (ROMs), CD-ROMs, hard drives, high density disk, or any other computer-readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. The present invention may also be embodied in the form of computer program code or an electronic signal, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the invention. When implemented on a general-purpose processor, the computer program code segments configure the processor to create specific logic circuits.

1. A method in a computer system for searching a network for network based content related to a search query, said computer system comprising a storage media and a processor, said method comprising the steps of: receiving said search query; formatting said search query; searching a first cache database for network based content related to said formatted search query, said first cache database comprising previous search results; if no network based content related to said formatted search query is found in said first cache database, providing said formatted search query to at least one search engine, wherein said formatted search query is in a format compatible with said at least one search engine; receiving network based content related to said formatted search query from the search engine; and updating said first cache database with the received network based content from the search engine; retrieving search results from said first cache database, wherein said search results are related to said formatted search query; merging said search results from said first cache database with returnable metadata from a second database, said second database comprising returnable metadata related to said previous search results generated from searchable metadata, wherein said returnable metadata from said second database is related to said search results from said first cache database; and formatting said merged search results related to said search results from said first cache database and said returnable metadata from said second database to one of a user provided format and a requested format.
2. A method in accordance with claim 1, further comprising the step of updating a history file comprising search results.
3. A method in accordance with claim 1, further comprising the step of updating said first cache database to comprise most recent search results, wherein: said most recent search results replace least recent search results; and search results residing in said first cache database for at least a predetermined amount of time are removed from said first cache database.
4. A method in accordance with claim 1, wherein said search query comprises at least one of multimedia and streaming media.
5. A system for searching a network for network based content related to a search query, said system comprising: a processor; an adapter for receiving and formatting said search query to a format compatible with at least one search engine; a first cache database comprising search results from previous search queries; a second database comprising returnable metadata related to said search results generated from searchable metadata; and a search engine for: coordinating searching of said first cache database; coordinating searching of said second database; coordinating searching of a third database; searching said first cache database for network based content related to said formatted search query; if no search results related to said formatted search query are found in said first cache database, searching said third database for search results related to said formatted search query; and updating said first cache database with said search results from searching said third database; retrieving search results from said first cache database, wherein said search results from said first cache database are related to said formatted search query; merging said search results from said first cache database with returnable metadata from said second database, wherein said returnable metadata from said second database is related to said search results from said first cache database; and providing said merged search results to said adapter, wherein said adapter formats said merged search results to one of a user provided format and a requested format.
6. A system in accordance with claim 5 further comprising a promoter for modifying data collected for use in a future search query.
7. A system in accordance with claim 6, wherein said modifying comprises at least one of: optimizing a format of said collected data for supporting said search engine, optimizing a selection of said collected data for supporting said search engine, and producing said collected data for conversion into a displayable format.
8. A system in accordance with claim 7, wherein said format of said collected data for supporting said search engine and said displayable format are different.
9. A system in accordance with claim 5, further comprising a search persistent database for storing data formatted for use by said search engine and said second database.
10. A system in accordance with claim 5, further comprising a plurality of adapters; a respective plurality of search engines; and a load balancer for approximately evenly distributing a processing load among each of said plurality of adapters and said respective search engines.
11. A system in accordance with claim 5, wherein communication between said adapter, said search engine, said first cache database and said second database is in accordance with a system specific application programming interface protocol.
12. A system in accordance with claim 5, wherein said search query comprises at least one of multimedia and streaming media.
13. A system in accordance with claim 5, wherein said third database comprises said searchable metadata.
14. A system in accordance with claim 5, wherein said first cache database comprises at least one of: a key generated from prior search query corresponding to said metadata in said second database, and a score generated from prior search query corresponding to said metadata in said second database.
15. A system in accordance with claim 5, wherein said search engine returns a subset of data from said merged search results corresponding to said formatted search query to said adapter for formatting, and said formatting is for generating a displayed page from a plurality of displayed pages corresponding to said merged search results corresponding to said formatted search query.
16. A computer-readable storage medium having embodied thereon a program for causing a processor to search a network for network based content related to a search query, said program comprising: means for causing said processor to receive said search query; means for causing said processor to format said search query; means for causing said processor to search a first cache database for network based content related to said formatted search query, said first cache database comprising previous search results; if no network based content related to said formatted search query is found in said first cache database, means for causing said processor to provide said formatted search query to at least one search engine, wherein said formatted search query is in a format compatible with said at least one search engine; means for causing said processor to receive network based content related to said formatted search query from the search engine; and means for causing said processor to update said first cache database with the received network based content from the search engine; means for causing said processor to retrieve search results from said first cache database, wherein said search results are related to said formatted search query; means for causing said processor to merge said search results from said first cache database with returnable metadata from a second database, said second database comprising returnable metadata related to said previous search results generated from searchable metadata, wherein said returnable metadata from said second database is related to said search results from said first cache database; and means for causing said processor to format said merged search results related to said search results from said first cache database and said returnable metadata from said second database to one of a user provided format and a requested format.
17. The computer-readable storage medium in accordance with claim 16, further comprising means for causing said processor to update a history file comprising search results.
18. The computer-readable storage medium in accordance with claim 16, further comprising means for causing said processor to update said first cache database to comprise most recent search results, wherein: said most recent search results replace least recent search results; and search results residing in said first cache database for at least a predetermined amount of time are removed from said first cache database.
19. The computer-readable storage medium in accordance with claim 16, wherein said search query comprises at least one of multimedia and streaming media.